Heart failure classification using deep learning to extract spatiotemporal features from ECG

Background: Heart failure is a syndrome with complex clinical manifestations. As populations age, heart failure has become a major medical problem worldwide. In this study, we used the MIMIC-III public database to extract the temporal and spatial characteristics of electrocardiogram (ECG) signals from patients with heart failure.
Methods: We developed a NYHA functional classification model for heart failure based on deep learning. We proposed a CNN-LSTM-SE model that integrates an attention mechanism, and we segmented the ECG signals into segments 2 to 20 s long. Ablation experiments showed that 12 s ECG segments gave the best heart failure classification with the proposed model.
Results: The accuracy, positive predictive value, sensitivity, and specificity of the NYHA functional classification method were 99.09, 98.9855, 99.033, and 99.649%, respectively.
Conclusions: The overall performance of this model exceeds that of similar methods, and the model can be used to assist clinical diagnosis.


Introduction
Heart failure is a syndrome with complex clinical manifestations. It can arise from a variety of causes, including structural damage to the heart and functional changes that prevent the heart from pumping enough blood to meet the body's needs. As the population ages, the number of patients with heart failure increases every year, bringing repeated hospitalizations, reduced quality of life, and other problems. These problems highlight the need for timely diagnosis, treatment, and prognosis. Estimating the severity of heart failure through classification is therefore clinically important for effective treatment.
Classifying heart failure is considered the most crucial step in treating it. The standard for classifying heart failure severity is the New York Heart Association (NYHA) functional classification [1], which is based on the patient's limitation of physical activity and symptoms. NYHA Class I indicates heart disease without limitation of physical activity; ordinary activity does not cause symptoms. NYHA Class II indicates slight limitation of physical activity: the patient is comfortable at rest, but ordinary daily activity causes symptoms. NYHA Class III indicates marked limitation of physical activity: the patient is comfortable at rest, but less than ordinary activity causes symptoms. NYHA Class IV indicates that the patient cannot carry out any physical activity without discomfort, and symptoms may be present even at rest. The electrocardiogram (ECG), which records the electrical activity of the heart, is used to monitor heart health and gives physicians a simple and intuitive clinical reference [2]. ECG signals (ECGs) from patients with heart failure differ in many ways from those of healthy people. Grading heart failure requires experienced cardiologists to study ECG recordings carefully, a process that is tedious and time-consuming, and small changes in the ECG may be missed by the naked eye. Computer-aided diagnosis (CAD) algorithms [3] can therefore be used to improve diagnostic accuracy. CAD uses machine learning [4] and deep learning methods to diagnose and analyze diseases from large-scale electronic medical data [5,6]. For example, Balasubramanian et al. [7] combined a convolutional neural network with a support vector machine to segment retinal blood vessels. CAD can provide valuable reference results for medical personnel, reduce doctors' workload, and help reduce misdiagnosis to a certain extent.
Many researchers have used ECGs to study the classification of heart failure. Tripoliti et al. [8] treated heart failure severity as a two-, three-, and four-class classification problem. Eleven classifiers were evaluated on a heart failure dataset of 378 patients using 10-fold cross-validation. The highest accuracies for the two-, three-, and four-class problems were 97, 87, and 67%, respectively. Zhang et al. [9] constructed datasets of patients with heart failure and applied natural language processing (NLP) to NYHA-related information in clinical data to classify patients with heart failure (NYHA Classes I-IV). Qu et al. [10] extracted multiple features from the heart rate variability (HRV) of patients with heart failure. A support vector machine (SVM) and a classification and regression tree (CART) were used to distinguish patients with heart failure in NYHA Classes I-III based on the extracted features. The accuracy, sensitivity, and specificity of the SVM classifier reached 84.0, 71.2, and 83.4%, respectively, while those of the CART classifier reached 81.4, 66.5, and 81.6%, respectively. Li et al. [11] proposed a convolutional neural network-recurrent neural network (CNN-RNN) model for real-time automatic classification of heart failure. Features extracted from ECGs were combined with other clinical features and fed to the RNN, which classified samples into five classes (normal and NYHA Classes I-IV). The proposed CNN-RNN model achieved a classification accuracy of 97.6%, sensitivity of 96.3%, and specificity of 97.4%. Li et al. [12] divided ECGs into 2 s segments and proposed a new multi-scale residual network (ResNet) to distinguish heart failure patients in different NYHA classes (NYHA Classes I-IV). The experimental results showed that the average positive predictive value, sensitivity, and accuracy of the proposed ResNet-34 were 93.49, 93.44, and 93.60%, respectively. D'Addio et al. [13] extracted features from Poincaré plots generated from 24 h ECG recordings and used machine learning algorithms, namely AdaBoost, k-nearest neighbors (KNN), and naive Bayes (NB), to distinguish heart failure patients in NYHA Classes I-III. All three algorithms achieved accuracies greater than 80% and areas under the receiver operating characteristic curve greater than 0.7. Sandhu et al. [14] analyzed 13 clinical features from the records of 299 patients with heart failure and classified these patients as NYHA Class III or IV. Their SVM-GA model classified the grade of heart failure and calculated feature importance; its accuracy, positive predictive value, and recall were 91.49, 94.25, and 93.6%, respectively. Tsai and Morshed [15] used the BIDMC congestive heart failure (CHF) database, which includes ECGs of NYHA Class III and IV patients, and extracted 28 features from the ECG data. Machine learning models (SVM, KNN, ensemble tree, decision tree, naive Bayes, and logistic regression) were used for automatic, real-time, high-precision classification of patients. KNN achieved an accuracy of 99.4%, and the accuracies of SVM, ensemble tree, decision tree, naive Bayes, and logistic regression were 99.4, 98.2, 99.4, 98.7, and 99.2%, respectively.
The above studies show that heart failure severity is primarily graded according to the NYHA classification standard, but few studies have classified heart failure into four categories. Zhang et al. [9] and Sandhu et al. [14] used patients' medical data as their datasets, and D'Addio et al. [13] used Poincaré plots as their experimental data, whereas ECG or HRV [16] data were used in the other studies [8, 10-12, 15]. This shows that many kinds of data are used in heart failure grading research and that there is not yet a universal automatic assessment model for heart failure. Therefore, we developed an objective and convenient heart failure classification model that uses only ECGs to evaluate the severity of heart failure. The task is essentially multi-class classification, and the framework of our model is shown in Fig. 1. The model classifies the severity of a patient's heart failure, with a higher NYHA class indicating more severe heart failure. The details of the proposed deep learning model in Fig. 1 are elaborated in Section III.
The main contributions in this paper are as follows:
grades are shown in Fig. 2, where the abscissa represents the sampling point and the ordinate represents the amplitude of the ECG. Not every patient in the waveform dataset had ECG recordings, so the class distribution of the dataset was imbalanced. To address this imbalance, we set initial weights, divided the training and test sets according to the class proportions, and employed cross-validation [18].
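The text says initial weights were set to counter the class imbalance but does not give the rule. One common choice, assumed here, is inverse-frequency class weighting, w_c = N / (K · n_c), where N is the number of samples, K the number of classes, and n_c the count of class c (the same rule scikit-learn uses for `class_weight="balanced"`). The function name and the class counts below are hypothetical placeholders, not the values in Table 1.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Return {class: N / (K * n_c)} for a list of class labels.

    Assumption: one common way to 'set initial weights' for imbalanced data;
    the paper does not state its exact rule.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in sorted(counts.items())}

# Hypothetical labels: NYHA Classes 1-4 with a rare Class I.
labels = [1] * 50 + [2] * 300 + [3] * 400 + [4] * 250
weights = inverse_frequency_weights(labels)
print(weights)
```

With this rule every class contributes the same total weight (N / K) to the loss, so the rare NYHA Class I segments are not swamped. In Keras, such a dictionary (keyed by 0-based class indices) can be passed to the `class_weight` argument of `Model.fit`.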

Pre-processing
The data used in this study included 30 min lead II ECGs of patients with different heart failure grades, which had to be segmented before being input into the deep learning network. The sampling frequency of the original ECG signals was 125 Hz. We kept the original sampling frequency and divided each whole recording into segments of 2-20 s. Some studies suggest that irregular R-R intervals may reflect cardiac functional abnormalities [19]. To ensure that the proposed deep learning model captures information from consecutive wave peaks, we performed R-peak detection on the ECG segments of each duration during preprocessing [19]. Segments with fewer than two R-peaks were excluded, ensuring that each segment contained at least two complete QRS complexes. The detection algorithm involves dynamic threshold computation, peak detection, a sliding window, and QRS validation. Figure 3 illustrates the R-peak detection results for 2 s and 3 s ECG segments: Fig. 3(1) contains two complete QRS complexes, while Fig. 3(2) contains four. Similar results were obtained for the other durations listed in Table 2; they are not presented here for brevity.
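The cleaning rule (keep a segment only if it has at least two R-peaks) can be sketched as follows. This is a simplified stand-in, not the authors' detector: the per-segment threshold of mean + 0.5 × standard deviation and the 200 ms refractory period are assumptions, and `detect_r_peaks` / `segment_and_filter` are hypothetical names. A synthetic spike train stands in for a real ECG.

```python
import numpy as np

FS = 125  # sampling frequency of the lead II recordings (Hz)

def detect_r_peaks(segment, fs=FS, k=0.5, refractory_s=0.2):
    """Simplified R-peak detector; threshold rule and refractory period are assumed."""
    x = np.asarray(segment, dtype=float)
    threshold = x.mean() + k * x.std()      # adapts to each segment's amplitude
    refractory = int(refractory_s * fs)     # minimum samples between two R-peaks
    peaks = []
    for i in range(1, len(x) - 1):
        is_local_max = x[i] >= x[i - 1] and x[i] > x[i + 1]
        if x[i] > threshold and is_local_max:
            if peaks and i - peaks[-1] < refractory:
                if x[i] > x[peaks[-1]]:      # keep the taller of two close peaks
                    peaks[-1] = i
            else:
                peaks.append(i)
    return peaks

def segment_and_filter(signal, seconds, fs=FS, min_peaks=2):
    """Split a recording into non-overlapping windows; keep those with >= min_peaks R-peaks."""
    n = int(seconds * fs)
    windows = [signal[s:s + n] for s in range(0, len(signal) - n + 1, n)]
    return [w for w in windows if len(detect_r_peaks(w, fs)) >= min_peaks]

# Synthetic 6 s signal: one spike per second for 4 s, then 2 s without beats.
sig = np.zeros(6 * FS)
for beat in range(4):
    sig[beat * FS + 60] = 1.0
kept = segment_and_filter(sig, seconds=2)
print(len(kept))  # 2
```

The last 2 s window contains no beats, so it is dropped, mirroring how segments with fewer than two R-peaks are excluded before training.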
The amounts of data remaining after R-peak-based cleaning of the ECG segments of each duration are presented in Table 2. A 30 min recording cannot be divided evenly into segments of 7, 11, 13, 14, 16, 17, or 19 s, so these durations were excluded. We built and tested models on the datasets for the remaining 12 durations to find the segmentation with the best performance.
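The exclusion rule is arithmetic on the 1,800 s recording length: a duration is kept only if it divides 1,800 with no remainder. The short check below (function name hypothetical) reproduces the excluded list and leaves the 12 durations that were modeled.

```python
RECORD_S = 30 * 60  # each lead II recording lasts 30 min = 1800 s

def evenly_dividing_durations(record_s=RECORD_S, candidates=range(2, 21)):
    """Return the segment lengths (s) that split the recording with no remainder."""
    return [d for d in candidates if record_s % d == 0]

retained = evenly_dividing_durations()
excluded = [d for d in range(2, 21) if d not in retained]
print(retained)  # [2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 18, 20]
print(excluded)  # [7, 11, 13, 14, 16, 17, 19]
```

For example, the 12 s setting yields 1800 / 12 = 150 segments per recording before R-peak cleaning, each holding 12 × 125 = 1,500 samples.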
Finally, to speed up the convergence of gradient descent [20], we applied Z-score standardization to the datasets:
$$ x' = \frac{x_i - \mu}{\sigma} \tag{1} $$
where x′ represents the normalized ECG segment, x_i is the sampled ECG signal, μ is the mean, and σ is the standard deviation of the population data.
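Equation (1) can be applied as below. Whether μ and σ are taken per segment or over the whole dataset is not fully specified ("population data"); this sketch assumes statistics computed once on the training segments and reused for test segments, which keeps test data from influencing the normalization. σ here is the population standard deviation (NumPy's default `ddof=0`). The helper names are hypothetical.

```python
import numpy as np

def fit_zscore(train_segments):
    """Mean and population standard deviation over all training samples (assumed scope)."""
    data = np.asarray(train_segments, dtype=float)
    return data.mean(), data.std()

def apply_zscore(segments, mu, sigma, eps=1e-8):
    """x' = (x - mu) / sigma, with eps guarding against division by zero."""
    return (np.asarray(segments, dtype=float) - mu) / (sigma + eps)

train = np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]])
mu, sigma = fit_zscore(train)
z = apply_zscore(train, mu, sigma)
print(mu, z.mean(), z.std())
```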

One-dimensional convolutional neural networks
A convolutional neural network (CNN) is a feedforward neural network with a deep structure that performs convolution operations; it is a representative deep learning algorithm [21]. Research on CNNs began in the 1980s, and LeNet-5 was one of the earliest CNNs [22]. After improved deep learning theory and computing hardware emerged in the 2000s, CNNs developed rapidly and were applied to computer vision, natural language processing, and other fields. Because the ECG data in this study are one-dimensional, unlike the two-dimensional images input to a standard CNN, we used a one-dimensional CNN, which suits such signals better [11].
Fig. 3 The results of R-peak detection
A one-dimensional CNN consists of one-dimensional convolution layers, pooling layers, and fully connected layers [21]. It learns the spatial features of the data automatically, without manual feature selection, so we used the CNN as a feature extractor. However, ECG signals have strong temporal characteristics, which a plain CNN does not extract well; the CNN must therefore be combined with a network that is good at processing temporal signals.
This study used a nine-layer deep CNN consisting of three one-dimensional convolution layers, three pooling layers, and three fully connected layers. A pooling layer after each convolution layer reduces the size of the feature map, and the fully connected layers output features for the final classification task.
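To make the convolution and pooling steps concrete, the sketch below implements a single-channel 1D convolution ("valid" mode, computed as cross-correlation as in deep learning frameworks) and non-overlapping max pooling in NumPy, then traces the length of one 12 s segment (1,500 samples at 125 Hz) through three conv-pool stages. The kernel size 5, pool size 2, and random weights are assumptions for illustration; the paper does not list these hyperparameters here, and real layers have many channels.

```python
import numpy as np

def conv1d_valid(x, kernel, bias=0.0):
    """Single-channel 1D convolution as used in CNNs (cross-correlation, no padding)."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) + bias for i in range(len(x) - k + 1)])

def max_pool1d(x, pool=2):
    """Non-overlapping max pooling; a trailing remainder shorter than `pool` is dropped."""
    n = len(x) // pool
    return x[:n * pool].reshape(n, pool).max(axis=1)

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal(12 * 125)   # one 12 s segment at 125 Hz -> 1500 samples
lengths = [len(x)]
for _ in range(3):                   # three conv + pool stages (kernel 5, pool 2 assumed)
    x = max_pool1d(relu(conv1d_valid(x, rng.standard_normal(5))), pool=2)
    lengths.append(len(x))
print(lengths)  # [1500, 748, 372, 184]
```

Each stage shortens the sequence roughly by half, which is what lets the later layers see a coarser, longer-range view of the signal.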

Long short-term memory
Long short-term memory (LSTM) is a type of recurrent neural network (RNN) that is often used to predict information in time sequences [23]. An RNN evaluates the current input using information from previous time steps, so it performs well on sequential prediction problems. However, RNNs are prone to vanishing gradients as the network depth or sequence length increases. LSTM extends the RNN with gates that screen memory information, retaining information useful to the model and alleviating the vanishing and exploding gradient problems of RNNs [24].
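Figure 4 shows the internal structure of an LSTM memory block. C_t and C_{t-1} are the cell states at the current and previous time steps, respectively; h_t and h_{t-1} are the outputs of the unit at the current and previous time steps, respectively; and X_t is the input to the network. The forget gate f_t uses a sigmoid function to control which information is discarded from the cell state. The input gate i_t uses a sigmoid function to decide how much of the candidate state, computed with a tanh function, is written into the cell state. The output gate O_t uses a sigmoid function to control the output information. The formulas are as follows:
$$
\begin{aligned}
f_t &= \sigma\left(K_f \cdot [h_{t-1}, X_t] + Z_f\right)\\
i_t &= \sigma\left(K_i \cdot [h_{t-1}, X_t] + Z_i\right)\\
\tilde{C}_t &= \tanh\left(K_c \cdot [h_{t-1}, X_t] + Z_c\right)\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t\\
O_t &= \sigma\left(K_o \cdot [h_{t-1}, X_t] + Z_o\right)\\
h_t &= O_t \odot \tanh(C_t)
\end{aligned}
\tag{2}
$$
where σ denotes the sigmoid function, [h_{t-1}, X_t] the concatenation of the previous output and the current input, and ⊙ element-wise multiplication.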
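One LSTM time step can be traced in NumPy using the document's notation (K for the gate weight matrices, Z for the biases). This is a minimal sketch of the standard LSTM cell, assuming the common formulation in which each gate acts on the concatenation [h_{t-1}, X_t]; frameworks such as Keras pack the same weights differently. `lstm_step` and the random weights are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, K, Z):
    """One LSTM time step (standard formulation, notation K/Z from the text).

    K[g] has shape (hidden, hidden + inputs) and Z[g] has shape (hidden,)
    for each gate g in {'f', 'i', 'o', 'c'}.
    """
    v = np.concatenate([h_prev, x_t])       # [h_{t-1}, X_t]
    f_t = sigmoid(K['f'] @ v + Z['f'])      # forget gate
    i_t = sigmoid(K['i'] @ v + Z['i'])      # input gate
    c_hat = np.tanh(K['c'] @ v + Z['c'])    # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat        # new cell state
    o_t = sigmoid(K['o'] @ v + Z['o'])      # output gate
    h_t = o_t * np.tanh(c_t)                # new output
    return h_t, c_t

hidden, n_in = 4, 1
rng = np.random.default_rng(0)
K = {g: 0.1 * rng.standard_normal((hidden, hidden + n_in)) for g in 'fioc'}
Z = {g: np.zeros(hidden) for g in 'fioc'}
h, c = np.zeros(hidden), np.zeros(hidden)
for sample in [0.1, 0.5, -0.2]:             # feed a short scalar sequence step by step
    h, c = lstm_step(np.array([sample]), h, c, K, Z)
print(h.shape, c.shape)  # (4,) (4,)
```

With all weights and biases at zero, every gate outputs 0.5 and the candidate is 0, so the cell state is simply halved at each step; this is a quick way to see the forget gate at work.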

Channel attention module
A problem arises when training deep neural networks: as more layers are added, the final classification performance can decrease instead of increase, and even the training-set accuracy stagnates. Although additional layers may extract deeper features, the network cannot select among these features well. We therefore integrate a channel attention mechanism into the CNN, which amplifies relevant features and suppresses irrelevant ones while making full use of the existing convolutional layers, without increasing the depth of the network.
The squeeze-and-excitation network (SE-Net) [25] is a channel attention mechanism. It is an image recognition architecture introduced in 2017 by the autonomous driving company Momenta, and it explicitly models the interdependencies between feature channels. The central idea of SE-Net is to learn channel weights through the network according to the loss function, increasing the weights of effective feature maps and reducing those of invalid or less useful ones. The internal structure of SE-Net is shown in Fig. 5. In the first step, called the squeeze operation, global average pooling reduces each channel to a scalar. In the second step, called the excitation operation, these scalars pass through two fully connected (FC) layers, the first followed by a ReLU and the second by a sigmoid, yielding a weight between 0 and 1 for each channel. Finally, every element of the original H × W feature map of each channel is multiplied by that channel's weight, recalibrating the original features channel by channel.
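For 1D ECG feature maps, the H × W map of an image becomes a single time axis, so squeeze, excitation, and scaling reduce to a few array operations. This NumPy sketch assumes a feature map of shape (length, channels) and a reduction ratio r = 4 for the hidden FC width; r, the random weights, and the name `se_block` are assumptions, not values from the paper.

```python
import numpy as np

def se_block(u, W1, b1, W2, b2):
    """Squeeze-and-excitation for a 1D feature map u of shape (length, channels).

    W1: (channels, channels // r), W2: (channels // r, channels); the
    reduction ratio r is an assumed hyperparameter.
    """
    z = u.mean(axis=0)                          # squeeze: global average pool per channel
    s = np.maximum(z @ W1 + b1, 0.0)            # excitation, FC 1 + ReLU
    s = 1.0 / (1.0 + np.exp(-(s @ W2 + b2)))    # excitation, FC 2 + sigmoid -> (0, 1)
    return u * s, s                             # scale every time step of each channel

rng = np.random.default_rng(0)
length, channels, r = 100, 8, 4
u = rng.standard_normal((length, channels))
W1, b1 = rng.standard_normal((channels, channels // r)), np.zeros(channels // r)
W2, b2 = rng.standard_normal((channels // r, channels)), np.zeros(channels)
out, weights = se_block(u, W1, b1, W2, b2)
print(out.shape, weights.round(2))
```

Because each channel is multiplied by a single scalar in (0, 1), the block only rescales existing feature maps; it adds two small FC layers but no extra convolutional depth.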
We added the SE-block after the second and third convolution layers of the CNN to automatically select related features and ignore irrelevant ones, resulting in a better classification of heart failure.

CNN-LSTM-SE model integrating attention mechanism
The structure of our proposed CNN-LSTM-SE model with an integrated attention mechanism is shown in Fig. 6. We performed an ablation experiment [26] to determine the optimal network structure. The proposed network contains 20 layers: 3 convolutional layers, 2 SE-blocks, 10 LSTM layers, 3 global average pooling layers, and 2 fully connected (FC) dense layers. First, a one-dimensional CNN extracts the spatial features of the ECGs. Second, LSTM layers are added before the FC layers of the CNN so that the model learns the sequential characteristics of the ECGs. Finally, SE-blocks are added after the second and third convolution layers of the CNN-LSTM model to focus automatically on relevant features and ignore irrelevant ones.
From the one-dimensional CNN model to the CNN-LSTM model and finally to the CNN-LSTM-SE model, the accuracy, specificity, sensitivity, and positive predictive value improved successively. The CNN-LSTM-SE model provided the best results, indicating that integrating LSTM and the attention mechanism into the one-dimensional CNN improves heart failure classification. The test results of the three models are described in Section V.

Implementation details
The software environment for this experiment was TensorFlow 2.3.0 and Python 3.8, and the hardware environment was an NVIDIA GeForce GTX 1060 GPU.

Fig. 5 Internal structure of SE-Net
A five-fold cross-validation method was adopted to evaluate the robustness of the proposed model [27]. The dataset was randomly divided into five parts; four were used for training and one for testing, and this cycle was repeated five times to build five models. The datasets for each of the 12 retained segment durations between 2 and 20 s were modeled separately, and the 12 sets of test results are described in Section V. For each fold, accuracy, sensitivity, specificity, and positive predictive value were computed, and these metrics were averaged over the five models to obtain the final evaluation results. The average training time for each model is 226 seconds, and the total training time for five-fold cross-validation is 18 minutes. The average time taken for model testing is 0.65 seconds.
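The split-and-average procedure can be sketched in plain Python. This assumes a shuffled split of sample indices, as the text describes ("divided randomly"); the function names, the seed, and the example metric values are hypothetical. If segments from one patient must never appear in both training and test folds, the indices would have to be grouped by patient before splitting, a variant not shown here.

```python
import random

def kfold_indices(n_samples, k=5, seed=42):
    """Shuffle sample indices and yield (train_idx, test_idx) for each of k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]        # k disjoint, nearly equal parts
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def average_metrics(per_fold):
    """Average each metric over folds, e.g. {'Acc': [...], 'PPV': [...]} -> means."""
    return {name: sum(vals) / len(vals) for name, vals in per_fold.items()}

splits = list(kfold_indices(23, k=5))
print([len(test) for _, test in splits])  # [5, 5, 5, 4, 4]
print(average_metrics({"Acc": [1.0, 0.5]}))  # {'Acc': 0.75}
```

Every sample lands in exactly one test fold, so the five averaged scores together cover the whole dataset.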
We trained the models by backpropagation with the Adam optimizer, using a learning rate of 0.001 for each training fold, 60 epochs, and a batch size of 32.
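For reference, the Adam update rule at the stated learning rate of 0.001 is sketched below on a toy one-parameter loss f(w) = w². The values β1 = 0.9, β2 = 0.999, and ε = 1e-7 are assumptions (they are the Keras defaults); the text specifies only the learning rate. `adam_minimize` is a hypothetical helper.

```python
import math

def adam_minimize(grad, w0, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7, steps=3000):
    """Plain Adam on a scalar parameter with bias-corrected moment estimates."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g         # first moment (running mean of gradients)
        v = beta2 * v + (1 - beta2) * g * g     # second moment (running mean of squares)
        m_hat = m / (1 - beta1 ** t)            # bias corrections for the zero start
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_final = adam_minimize(lambda w: 2.0 * w, w0=1.0)  # gradient of w**2
print(abs(w_final) < 0.05)  # True
```

Because the step is normalized by the running gradient magnitude, each early update moves the parameter by roughly the learning rate (0.001), independent of the gradient's scale.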

Results and discussion
For imbalanced samples, accuracy alone does not evaluate a model's performance comprehensively. Therefore, four objective indexes were used to evaluate the classification performance of the proposed model: accuracy (Acc), positive predictive value (PPV), specificity (Spe), and sensitivity (Sen). They are defined in terms of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). Acc is the proportion of correctly predicted samples among all samples; PPV is the proportion of actual positive samples among all samples predicted as positive; Spe is the proportion of actual negative samples that are predicted as negative; and Sen is the proportion of actual positive samples that are predicted as positive:
$$ \mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN} $$
$$ \mathrm{PPV} = \frac{TP}{TP + FP} $$
$$ \mathrm{Spe} = \frac{TN}{TN + FP} $$
$$ \mathrm{Sen} = \frac{TP}{TP + FN} $$
We adopted two training schemes. Scheme A trains the network without any dropout and serves as a reference for comparing a regular network with a dropout network. In Scheme B, 20% of the input and recurrent connections of the LSTM layers are dropped out. The accuracy and loss curves for both schemes are presented in Fig. 7. Compared with the regular network, the accuracy curve of the dropout network fluctuates little: both its training and validation curves rise steadily and stabilize at around 99% by 60 epochs. In contrast, the validation loss curve of the regular network oscillates significantly; at 60 epochs its training accuracy stabilizes at 99%, while its validation accuracy is 98%. The validation accuracy of Scheme A is therefore 1 percentage point lower than that of Scheme B.
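The four metrics are binary definitions. For the four NYHA classes they are usually computed one-vs-rest (each class in turn treated as positive) and then averaged; the paper does not state its averaging scheme, so macro averaging is assumed in this sketch. It derives TP, FP, TN, and FN per class from a confusion matrix; the 3 × 3 matrix is a toy example, not data from the paper. The mean of the one-vs-rest Acc values differs from the overall multi-class accuracy (trace divided by total), so both are returned.

```python
import numpy as np

def one_vs_rest_metrics(cm):
    """Per-class and macro-averaged Acc, PPV, Sen, Spe from a K x K confusion matrix.

    cm[i, j] counts samples of true class i predicted as class j.
    Macro averaging is an assumption; the paper does not state its scheme.
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp      # predicted as class k but belonging to another class
    fn = cm.sum(axis=1) - tp      # belonging to class k but predicted as another class
    tn = total - tp - fp - fn
    per_class = {
        "Acc": (tp + tn) / total,
        "PPV": tp / (tp + fp),
        "Sen": tp / (tp + fn),
        "Spe": tn / (tn + fp),
    }
    macro = {name: float(vals.mean()) for name, vals in per_class.items()}
    overall_acc = float(tp.sum() / total)
    return per_class, macro, overall_acc

cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 0, 10]]
per_class, macro, overall = one_vs_rest_metrics(cm)
print(macro, overall)
```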
The test results of the three models (CNN, CNN-LSTM, and CNN-LSTM-SE) generated by the ablation experiment are shown in Table 3. The datasets used were patients' ECGs   segmentation methods, this model (12 s segments) has the highest accuracy, positive predictive value, and specificity, and the third-highest sensitivity. Its sensitivity is only 0.001 percentage points lower than that of the 9 s segmentation (ranked second) and 0.077 percentage points lower than that of the 15 s segmentation (ranked first), so it is almost equal to the second best. Therefore, the proposed CNN-LSTM-SE model has the best overall performance when the datasets are divided into 12 s segments.
The confusion matrices of the CNN-LSTM-SE model with 12 s segments are shown in Fig. 8. As Fig. 8 shows, the model is more likely to confuse a grade of heart failure with a neighboring grade than with a non-adjacent grade. For example, in Fig. 8(1), 16 NYHA Class III samples were misclassified as NYHA Class II, 15 as NYHA Class IV, and only 1 as NYHA Class I. In Fig. 8(5), 16 NYHA Class IV samples were misclassified as NYHA Class III and only 1 as NYHA Class II. This suggests that ECGs from adjacent grades of heart failure are more similar than those from non-adjacent grades, which makes them harder for the model to distinguish.
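The adjacent-grade pattern can be quantified as the share of a class's misclassifications that land in a neighboring NYHA class. The counts below are the ones quoted in the text for Fig. 8(1) and Fig. 8(5); the helper name is hypothetical.

```python
def adjacent_error_share(errors_by_predicted, true_class):
    """Fraction of misclassifications that fall into a neighboring NYHA class.

    errors_by_predicted maps predicted class -> count for one true class
    (correct predictions excluded).
    """
    total = sum(errors_by_predicted.values())
    adjacent = sum(n for cls, n in errors_by_predicted.items() if abs(cls - true_class) == 1)
    return adjacent / total

# Fig. 8(1): true NYHA Class III predicted as I (1), II (16), IV (15).
share = adjacent_error_share({1: 1, 2: 16, 4: 15}, true_class=3)
print(round(share, 3))  # 0.969
```

For Fig. 8(5) (true Class IV: 16 predicted as III, 1 as II), the same function gives 16/17 ≈ 0.941, so in both cases more than 94% of the errors are between adjacent grades.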
The model test results for the five-fold cross-validation are shown in Table 5. All folds except the third achieve an Acc above 99%; the third-fold model has a slightly lower Acc of 98.76%. The average PPV was 98.9855% (close to 99%), the average Sen was 99.033%, and the average Spe was 99.649% (close to 100%). These results indicate that the model trained on 12 s segments performs well on all metrics.
To further verify the performance of the proposed CNN-LSTM-SE model, we tested it on two other datasets (Data-sets A and B). Data-set A was obtained from public PhysioBank databases, namely the Beth Israel Deaconess Medical Center (BIDMC) Congestive Heart Failure Database [28] and the Fantasia Database [29]. Data-set B was obtained from the Intercity Digital ECG Alliance (IDEAL) study in the Telemetric and Holter ECG Warehouse (THEW) archives of the University of Rochester Medical Center [30]. Details of the ECG signals obtained from each database are presented in Table 6. The BIDMC database contains ECGs from 15 patients with CHF, classified according to the NYHA standard without distinguishing between NYHA Classes III and IV. The Fantasia database includes ECGs from 18 healthy individuals. The THEW database contains ECGs from 50 patients with CHF, categorized into severity grades 1-4, although the classification standard used is not explicitly stated.
We used Data-set A (BIDMC + Fantasia) to perform a binary classification test for diagnosing heart failure, and Data-set B (THEW) to perform a separate four-class test for assessing heart failure severity, both with the CNN-LSTM-SE model. The results are shown in Table 7. The binary classification model on Data-set A achieved an Acc of 99.35%, PPV of 99.35%, Sen of 99.37%, and Spe of 99.37%. The four-class classification model on Data-set B achieved an Acc of 98.91%, PPV of 98.39%, Sen of 99.06%, and Spe of 99.57%. Except for the Acc (98.91%) and PPV (98.39%) on Data-set B, all metrics on Data-sets A and B are above 99%. The CNN-LSTM-SE model thus also performs well on both datasets, which suggests that it is robust across datasets.
To further verify the performance of the proposed CNN-LSTM-SE model, we compared it with other existing heart failure classification methods (e.g., SVM, CNN, natural language processing (NLP), and ResNet). The performance indicators of each model are shown in Table 8. Current research on heart failure classification mainly includes two-, three-,      Most deep learning techniques for heart failure classification rely largely on CNNs to extract the spatial features of ECGs and neglect their temporal characteristics. This paper presents an alternative method that incorporates LSTM to capture the sequential features of ECG signals and an attention mechanism to focus on important features associated with heart failure. Therefore, our CNN-LSTM-SE model outperforms the methods in [9] and [12]. For the five-class heart failure classification problem, the CNN-RNN model [11] obtained an Acc of 97.6%. That model considers both the temporal and spatial features of the ECG, but the method proposed in this paper additionally incorporates an attention mechanism that makes the model focus on key features related to heart failure, so our CNN-LSTM-SE model performs better than the CNN-RNN model. Reference [11] only discussed dividing ECGs into 2 s and 5 s segments, whereas we examine the impact of a range of ECG segment lengths on heart failure classification and find that 12 s segments give the highest accuracy. Our model, designed for the four-class heart failure classification problem, yields noteworthy results. We also analyzed the data used in this experiment and visualized the ECG signals. The violin plot [32] of the ECG amplitude for each severity level of heart failure is shown in Fig. 9.
The violin plot makes the amplitude distribution for each severity level easier to compare. As shown in Fig. 9, the ECG amplitudes of NYHA Class I are all concentrated between 0 and 1. The amplitudes of NYHA Classes II, III, and IV are more dispersed: between −2 and 2 for Class II, between −2 and 2.8 for Class III, and between −2.8 and 2.2 for Class IV. However, apart from a few outliers, the amplitudes for NYHA Classes II, III, and IV are also mainly concentrated between 0 and 1. The distributions of the four classes are similar, with density peaking around 0.5 and decreasing toward 0 and 1. Simple characteristics such as amplitude therefore cannot be relied on to distinguish the grade of heart failure, which is why a deep learning model is needed to distinguish between the four levels.
In addition, to enhance the interpretability of our model, we applied gradient-weighted class activation mapping (Grad-CAM) to obtain heat maps of the last convolutional layer, highlighting the regions on which the model focuses. Figure 10 shows ECGs of heart failure NYHA Classes I-IV overlaid with the heat maps of the last convolutional layer calculated by Grad-CAM. The color bar from blue to red indicates the degree of model attention, from low to high. Fig. 10(1) shows that for NYHA Class I the model focuses on the QRS complex. In Fig. 10(2)-(4), the model concentrates predominantly on the ST segment, which is known to exhibit abnormal changes in the ECGs of heart failure patients [33]. As the disease progresses, changes in the ST-T segment (the region of the ST segment and T wave) become more pronounced; these changes correlate strongly with the severity of heart failure and serve as a reliable indicator. In most ECGs, the ST-T segment is redder than the other segments, showing that the model pays more attention to the ST-T segment; this may provide a useful reference to support clinicians' decisions.
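Grad-CAM weights each channel of the last convolutional layer by the average gradient of the class score with respect to that channel, sums the weighted channels, and applies a ReLU. Below is a 1D NumPy sketch that assumes the activations and gradients have already been obtained from the network (in TensorFlow this is typically done with `tf.GradientTape`); upsampling by linear interpolation to the input length is an assumed choice, and `grad_cam_1d` is a hypothetical name.

```python
import numpy as np

def grad_cam_1d(activations, gradients, input_length):
    """1D Grad-CAM heat map scaled to [0, 1].

    activations, gradients: arrays of shape (length, channels) from the last
    convolutional layer; gradients are d(class score)/d(activations).
    """
    weights = gradients.mean(axis=0)                 # alpha_k: mean gradient per channel
    cam = np.maximum(activations @ weights, 0.0)     # ReLU(sum_k alpha_k * A_k)
    # Stretch the coarse map back to the ECG length by linear interpolation (assumed).
    positions = np.linspace(0, len(cam) - 1, input_length)
    cam = np.interp(positions, np.arange(len(cam)), cam)
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

A = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 0.0]])  # toy activations
G = np.array([[1.0, 0.0]] * 4)                                  # toy gradients
print(grad_cam_1d(A, G, 7))
```

The resulting array has one value per ECG sample, so it can be drawn as a colored band under the trace in the same way as the overlays in Fig. 10.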
The above experimental results show that our deep learning model simultaneously extracts the spatial and temporal characteristics of the ECGs of patients with heart failure and, by incorporating the attention mechanism, focuses on the key features of the signals. The proposed model achieves good classification results, and its overall performance is better than that of similar methods.

Conclusion
This paper proposes a deep learning model, CNN-LSTM-SE, that combines a CNN, LSTM, and an attention mechanism. The model automatically classifies heart failure into four levels based on the ECG data of patients with heart failure.
We used a CNN to extract the spatial characteristics of ECGs and LSTM to obtain their time-series characteristics, and we incorporated the attention mechanism into the model to focus on the key features of ECGs. The datasets constructed with 12 s ECG segments provided the best classification with the proposed model. The overall performance of the deep learning model described in this paper is better than that of current shallow machine learning models and similar deep learning models. It can assist medical staff in clinical diagnosis and has good application prospects. Because the diagnosis of many kinds of heart disease involves processing and analyzing ECGs [34][35][36][37], this method is not limited to heart failure classification and can be extended to other fields such as arrhythmia [38][39][40] and coronary artery disease [41][42][43][44].
The limitations of our CNN-LSTM-SE model are as follows: 1. The ECG segments input to the model should contain at least one complete ECG beat (P wave, PR segment [45][46][47], QRS complex, ST-T segment, and U wave) to ensure accurate classification. The interpretability visualizations indicate that if an input ECG segment does not contain a complete beat, some important features associated with the four grades of heart failure may be lost, which affects the model's decisions. 2. Our model is a monomodal method that classifies heart failure based only on ECGs, without considering other clinical health data of heart failure patients, so there is still room to improve its classification performance.
Future work based on the proposed model is as follows: 1. The proposed model was developed using an imbalanced dataset; we will work with hospitals to improve the existing datasets, especially by adding data from NYHA Class I patients, to further refine the model's performance. 2. A multimodal [48] network will be constructed to classify heart failure. Building on the monomodal deep learning model in this paper, patient data from other modalities related to heart failure will be added to further improve the objectivity of the classification results and the interpretability of related diseases. For example, adding clinical indicators such as patients' blood pressure and blood glucose to the proposed model could help further explore the relationship between heart disease and underlying conditions [49] such as hypertension and hyperglycemia.

1. Construct a deep learning model for heart failure classification that uses a CNN and long short-term memory (LSTM) to extract the spatial and temporal characteristics of the ECGs of patients with heart failure, and incorporate an attention mechanism so that the model automatically focuses on the key features of these ECGs.
2. The CNN-LSTM-SE model proposed in this paper has a simple, lightweight structure. Noise filtering, feature extraction, and feature selection techniques are not required.
3. Discuss the effect of ECG segment length on heart failure classification and identify the best partition. Train and verify the performance of the proposed CNN-LSTM-SE deep learning model, which automatically divides cases of heart failure into four categories according to the NYHA classification standard, using the best ECG segments.
4. Conduct an interpretability analysis of the proposed deep learning model by overlaying the ECG with heat maps generated using gradient-weighted class activation mapping (Grad-CAM). Comparing ECGs of the four severity grades of heart failure shows that for NYHA Class I ECGs the proposed model mainly focuses on the QRS complex, whereas for NYHA Class II-IV its attention is concentrated mostly on the ST-T segment. This may provide a useful reference to assist clinicians' decisions.
5. The proposed model was tested on different heart failure datasets and achieved good results, indicating good robustness.

Fig. 1 The framework of our method

Fig. 6 Architecture diagram of the proposed CNN-LSTM-SE model

Fig. 7 Accuracy and loss plots for the various schemes during training
four- and five-grade classification. Traditional shallow machine learning methods (e.g., SVM, CART, and AdaBoost) are mostly used in two- and three-grade heart failure classification studies. However, the inherent limitations of shallow machine learning, such as manual feature extraction and the characteristics of the models themselves, make it difficult to achieve high accuracy in heart failure classification. The Acc of the heart failure classification of the machine learning model is around

Fig. 8 The confusion matrices of the CNN-LSTM-SE model with ECGs divided into 12 s segments

Fig. 9 Violin diagram of ECG amplitudes for the four severity levels of heart failure
Fig. 10

Data-set establishment
Based on the MIMIC-III v1.4 database, heart failure classification is studied by combining deep learning with ECG signals. First, all ICD-9 codes relevant to heart failure were identified from the DIAGNOSES_ICD table of the dataset. A total of 25 heart failure codes were found in the table, including congestive heart failure, systolic heart failure, and diastolic heart failure. Patients' diagnosis results were recorded in the DRGCODES.csv file of the MIMIC-III dataset. A total of 10,436 patients with heart failure were screened from the DRGCODES.csv file according to ICD-9 coding, of whom 644 were labeled with NYHA grades. Finally, by cross-referencing patient IDs, multi-lead ECG data were collected from the waveform dataset for 268 heart failure patients. Not all of these 268 patients had complete multi-lead ECGs, so for data consistency we used the lead II ECG as the dataset for this article. The resulting distribution of heart failure severity grades is presented in Table 1, while examples of the ECGs of the four NYHA

Table 1
Data used in this study
Fig. 2 Example ECGs for different classes
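The cohort construction in the data-set establishment step amounts to filtering diagnoses and intersecting patient IDs. The sketch below works on in-memory rows. The ICD-9 prefix "428" (the ICD-9-CM heart failure category) is only an assumed stand-in for the 25 codes the paper used, which are not listed, and the NYHA-labeled and waveform ID sets are hypothetical inputs because the extraction of NYHA labels is not described. In MIMIC-III, DIAGNOSES_ICD stores ICD9_CODE without the decimal point (e.g. "4280").

```python
def heart_failure_cohort(diagnoses, nyha_labeled_ids, waveform_ids, icd9_prefix="428"):
    """Intersect patient IDs across the sources used to build the dataset.

    diagnoses: iterable of (SUBJECT_ID, ICD9_CODE) rows from DIAGNOSES_ICD.
    nyha_labeled_ids / waveform_ids: sets of SUBJECT_IDs (hypothetical inputs).
    icd9_prefix: assumed heart failure code prefix, not the paper's exact code list.
    """
    hf_ids = {sid for sid, code in diagnoses if code and code.startswith(icd9_prefix)}
    return sorted(hf_ids & set(nyha_labeled_ids) & set(waveform_ids))

rows = [(1, "4280"), (2, "41401"), (3, "42821"), (4, "4289"), (5, "4280")]
print(heart_failure_cohort(rows, nyha_labeled_ids={1, 3, 4}, waveform_ids={1, 4, 5}))  # [1, 4]
```

Each intersection mirrors one narrowing step in the text (heart failure diagnosis, NYHA label, available waveform), which is how the cohort shrinks from 10,436 to 644 to 268 patients.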

Table 2
Summary of the amounts of data segmented by different durations
Fig. 4 Internal structure of LSTM block
Zhang et al. BMC Medical Informatics and Decision Making (2024) 24:17
where K_f, K_i, K_o, and K_c represent the weight matrices corresponding to the forget gate, input gate, output gate, and cell state, respectively, and Z_f, Z_i, Z_o, and Z_c represent the corresponding biases.

Table 3
Comparison of different model performance on ECG datasets divided by 12 s

Table 4
Performance comparison of CNN-LSTM-SE model on ECG datasets divided by different durations

Table 5
Five-fold cross-validation of CNN-LSTM-SE model
Zhang et al. [9] adopted the NLP method, using the patient's clinical data as the model input; the PPV of the model was 94.99%. Li et al. [12] improved ResNet-34 by adding multi-scale residual blocks. The Acc of heart failure classification obtained by this model reached 94.29%, and the PPV was 94.16%.

Table 6
The details of ECG signals obtained from various databases

Table 7
Results on the data-sets A and B

Table 8
Summary of performance comparison for different methods